Results 1 - 20 of 24
1.
RSC Adv ; 14(19): 13083-13094, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38655474

ABSTRACT

The solute carrier transporter family 6 (SLC6) is of key interest for its critical role in the transport of small amino acids or amino acid-like molecules. Dysfunction of these transporters is strongly associated with human diseases including schizophrenia, depression, and Parkinson's disease. Linking single point mutations to disease may support insights into the structure-function relationship of these transporters. This work aimed to develop a computational model for predicting the potential pathogenic effect of single point mutations in the SLC6 family. Missense mutation data were retrieved from UniProt, LitVar, and ClinVar, covering multiple protein-coding transcripts. As an encoding approach, amino acid descriptors were used to calculate the average sequence properties for both original and mutated sequences. In addition to the full-sequence calculation, the sequences were cut into twelve domains. The domains are defined according to the transmembrane domains of the SLC6 transporters to analyse the regions' contributions to the pathogenicity prediction. Subsequently, several classification models, namely Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), with hyperparameters optimized through grid search, were built. Repeated stratified k-fold cross-validation was used to estimate model performance. The accuracy values of the generated models are in the range of 0.72 to 0.80. Analysis of feature importance indicates that mutations in distinct regions of SLC6 transporters are associated with an increased risk for pathogenicity. When the model was applied to an independent validation set, accuracy dropped to an average of 0.6, with high precision but low sensitivity scores.
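
Note on the encoding step in entry 1: the abstract describes averaging amino acid descriptors over the original and mutated sequences, both for the full sequence and per domain. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not name the descriptor set, so the Kyte-Doolittle hydropathy scale is used here purely as a stand-in, and the toy sequence, the domain boundaries, and the function names are all hypothetical.

```python
# Kyte-Doolittle hydropathy values, used only as a stand-in descriptor
# (assumption: the abstract does not say which descriptors were used).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def apply_mutation(seq, pos, ref, alt):
    """Return seq with the residue at 1-based position pos changed from ref to alt."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def mean_descriptor(seq, scale=KD):
    """Average descriptor value over all residues of seq."""
    return sum(scale[aa] for aa in seq) / len(seq)


def encode(seq, pos, ref, alt, domains):
    """Full-sequence and per-domain averages for the wild type and the mutant.

    domains maps a hypothetical domain name to 1-based inclusive (start, end).
    """
    mutant = apply_mutation(seq, pos, ref, alt)
    features = {
        "full_wt": mean_descriptor(seq),
        "full_mut": mean_descriptor(mutant),
    }
    for name, (start, end) in domains.items():
        features[f"{name}_wt"] = mean_descriptor(seq[start - 1:end])
        features[f"{name}_mut"] = mean_descriptor(mutant[start - 1:end])
    return features


# Toy input: a 10-residue sequence split into two made-up domains.
toy_seq = "MALKVILAGG"
toy_domains = {"d1": (1, 5), "d2": (6, 10)}
feats = encode(toy_seq, 4, "K", "E", toy_domains)
```

Only the domain that contains the mutated position changes its average, which is what lets a downstream classifier attribute importance to specific regions.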

2.
Plant Sci ; 339: 111931, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38030036

ABSTRACT

Iron is an essential micronutrient for life. During seed development, iron accumulates during embryo maturation. In Arabidopsis thaliana, iron mainly accumulates in the vacuoles of only one cell type, the cell layer that surrounds the provasculature in the hypocotyl and cotyledons. The iron accumulation pattern in Arabidopsis is an exception in plant phylogeny: most dicot embryos accumulate iron in several cell layers including the cortex and, in some cases, even the protodermis. It remains unknown how iron reaches the internal cell layers of the embryo and, in particular, which molecular mechanisms are responsible for this process. Here, we use transgenic approaches to modify the iron accumulation pattern in an Arabidopsis model. Using the SDH2-3 embryo-specific promoter, we were able to express VIT1 ectopically in both a wild type background and a mutant vit1 background lacking expression of this vacuolar iron transporter. These manipulations modify the iron distribution pattern in Arabidopsis from one cell layer to several cell layers, including protodermis, cortex cells, and the endodermis. Interestingly, total seed iron content was not modified compared with the wild type, suggesting that iron distribution in embryos is not involved in the control of the total iron amount accumulated in seeds. This experimental model can be used to study the processes involved in iron distribution patterning during embryo maturation and its evolution in dicot plants.


Subjects
Arabidopsis Proteins , Arabidopsis , Arabidopsis/metabolism , Iron/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Promoter Regions, Genetic/genetics , Seeds/metabolism , Gene Expression Regulation, Plant
3.
J Mol Biol ; 436(2): 168383, 2024 01 15.
Article in English | MEDLINE | ID: mdl-38070861

ABSTRACT

Creatine is an essential metabolite for the storage and rapid supply of energy in muscle and nerve cells. In humans, impaired metabolism, transport, and distribution of creatine throughout tissues can cause varying forms of mental disability, also known as creatine deficiency syndrome (CDS). So far, 80 mutations in the creatine transporter (SLC6A8) have been associated with CDS. To better understand the effect of human genetic variants on the physiology of SLC6A8 and their possible impact on CDS, we studied 30 missense variants including 15 variants of unknown significance, two of which are reported here for the first time. We expressed these variants in HEK293 cells and explored their subcellular localization and transport activity. We also applied computational methods to predict variant effect and estimate site-specific changes in thermodynamic stability. To explore variants that might have a differential effect on the transporter's conformers along the transport cycle, we constructed homology models of the inward-facing and outward-facing conformations. In addition, we used mass spectrometry to study proteins that interact with wild type SLC6A8 and five selected variants in HEK293 cells. In silico models of the protein complexes revealed how two variants impact the interaction interface of SLC6A8 with other proteins and how pathogenic variants lead to an enrichment of ER protein partners. Overall, our integrated analysis disambiguates the pathogenicity of 15 variants of unknown significance, revealing diverse mechanisms of pathogenicity, including two previously unreported variants obtained from patients suffering from creatine deficiency syndrome.


Subjects
Brain Diseases, Metabolic, Inborn , Creatine , Mental Retardation, X-Linked , Nerve Tissue Proteins , Plasma Membrane Neurotransmitter Transport Proteins , Humans , Creatine/deficiency , HEK293 Cells , Mental Retardation, X-Linked/genetics , Nerve Tissue Proteins/deficiency , Nerve Tissue Proteins/genetics , Plasma Membrane Neurotransmitter Transport Proteins/deficiency , Plasma Membrane Neurotransmitter Transport Proteins/genetics , Brain Diseases, Metabolic, Inborn/genetics , DNA Mutational Analysis/methods , Mutation, Missense , Computational Biology/methods
4.
ACS Chem Biol ; 18(12): 2464-2473, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38098458

ABSTRACT

Molecular glue degraders (MGDs) are small molecules that degrade proteins of interest via the ubiquitin-proteasome system. While MGDs were historically discovered serendipitously, approaches for MGD discovery now include cell-viability-based drug screens or data mining of public transcriptomics and drug response datasets. These approaches, however, have target spaces restricted to the essential proteins. Here we develop a high-throughput workflow for MGD discovery that also reaches the nonessential proteome. This workflow begins with the rapid synthesis of a compound library by sulfur(VI) fluoride exchange chemistry coupled to a morphological profiling assay in isogenic cell lines that vary in levels of the E3 ligase CRBN. By comparing the morphological changes induced by compound treatment across the isogenic cell lines, we were able to identify FL2-14 as a CRBN-dependent MGD targeting the nonessential protein GSPT2. We envision that this workflow would contribute to the discovery and characterization of MGDs that target a wider range of proteins.


Subjects
Proteasome Endopeptidase Complex , Ubiquitin-Protein Ligases , Proteolysis , Proteasome Endopeptidase Complex/metabolism , Ubiquitin-Protein Ligases/metabolism , Proteins/metabolism , Ubiquitin/metabolism
5.
Nature ; 2023 Jun 05.
Article in English | MEDLINE | ID: mdl-37277473
6.
Biometals ; 36(1): 227-237, 2023 02.
Article in English | MEDLINE | ID: mdl-36454509

ABSTRACT

Zinc is the second most prevalent metal element present in living organisms, and control of its concentration is pivotal to physiology. The amount of zinc available to the cell cytoplasm is regulated by the activity of members of the SLC39 family, the ZIP proteins. Selectivity of ZIP transporters has been the focus of earlier studies which provided a biochemical and structural basis for the selectivity for zinc over other metals such as copper, iron, and manganese. However, several previous studies have shown that certain ZIP proteins exhibit higher selectivity for metal elements other than zinc. Sequence similarities suggest an evolutionary basis for the elemental selectivity within the ZIP family. Here, by engineering HEK293 cells to overexpress ZIP proteins, we have studied the selectivity of two phylogenetic clades of ZIP proteins, that is, ZIP8/ZIP14 (previously known to be iron and manganese transporters) and ZIP5/ZIP10. By incubating ZIP-overexpressing cells in the presence of several divalent metals, we found that ZIP5 and ZIP10 are high-affinity copper transporters with greater selectivity over other elements, revealing a novel substrate signature for the ZIP5/ZIP10 clade.


Subjects
Copper , Manganese , Humans , Copper/metabolism , HEK293 Cells , Iron/metabolism , Manganese/metabolism , Membrane Transport Proteins , Metals/metabolism , Phylogeny , Zinc/metabolism
7.
iScience ; 25(10): 105096, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-36164651

ABSTRACT

Solute carriers are an operationally defined diverse family of membrane proteins involved in the transport of nutrients, metabolites, xenobiotics, and drugs. Here, we provide an integrative classification of solute carriers by combining evolutionary information with proteome-wide structure models recently made available through the AlphaFold resource. Analyses of orthologous relations among 455 protein-coding genes currently classified as human solute carriers, over the fully sequenced genomes of 2,100 species, suggest no more than approximately 180 independent evolutionary origins. Structural comparative analyses provided further insight revealing a total of 24 structurally distinct transmembrane folds, increasing by approximately 40% the number of previously described SLC structural folds. In addition, a structural comparative analysis identified a new human solute carrier member and revealed details of noncanonical ones. Our analyses uncover new ancestral relations between solute carrier genes, provide insights into the evolution of remote homologs and a platform to test hypotheses of functional deorphanization.

8.
J Mol Evol ; 89(6): 357-369, 2021 07.
Article in English | MEDLINE | ID: mdl-33934169

ABSTRACT

We use large-scale mutagenesis data and computer simulations to quantify the mutational robustness of protein-coding genes by taking into account constraints arising from protein function and the genetic code. Analyses of the distribution of amino acid substitutions from 18 mutagenesis studies revealed an average of 45% of neutral variants, while mutagenesis data of 12 proteins artificially designed under no constraints other than stability reach an average of 60%. Simulations using a lattice protein model allow us to contrast these estimates with the expected mutational robustness of protein families by generating unbiased samples of foldable sequences, which we find to have 30% of neutral variants. In agreement with mutagenesis data of designed proteins, the model shows that maximally robust protein families might access up to twice the amount of neutral variants observed in the unbiased samples (i.e. 60%). A biophysical model of protein-ligand binding suggests that constraints associated with molecular function have only a moderate impact on robustness of approximately 5 to 10% of neutral variants, and that the direction of this effect depends on the relation between functional performance and thermodynamic stability. Although the genetic code constrains the access of a gene's nucleotide sequence to only 30% of the full distribution of amino acid mutations, it provides an extra 15 to 20% of neutral variants to the estimates above, such that the expected, observed, and maximal robustness of protein-coding genes are approximately 50, 65, and 75%, respectively. We discuss our results in the light of three main hypotheses put forward to explain the existence of mutationally robust genes.


Subjects
Genetic Code , Proteins , Humans , Models, Genetic , Mutagenesis , Mutation , Proteins/genetics , Thermodynamics
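
Note on the figures in entry 8: the abstract gives protein-level fractions of neutral variants (30% expected, 45% observed, 60% maximal) and states that the genetic code adds a further 15 to 20%. The short check below only restates that arithmetic; the variable names are hypothetical, and simple addition of percentage points is an assumption about how the reported gene-level values (approximately 50, 65, and 75%) were obtained.

```python
# Protein-level neutral-variant percentages quoted in the abstract.
protein_level = {"expected": 30, "observed": 45, "maximal": 60}

# Extra neutral variants attributed to the genetic code (percentage points).
code_bonus_low, code_bonus_high = 15, 20

# Assumption: the gene-level estimate is the protein-level value plus the bonus.
gene_level = {
    name: (value + code_bonus_low, value + code_bonus_high)
    for name, value in protein_level.items()
}

# Approximate gene-level values reported in the abstract.
reported = {"expected": 50, "observed": 65, "maximal": 75}

for name, (low, high) in gene_level.items():
    inside = low <= reported[name] <= high
    print(f"{name}: {low}-{high}% (reported ~{reported[name]}%, within range: {inside})")
```

Under this reading, each reported value falls inside the corresponding range, with the maximal estimate sitting at the lower bound.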
9.
Evol Bioinform Online ; 15: 1176934319870485, 2019.
Article in English | MEDLINE | ID: mdl-31452598

ABSTRACT

In order to preserve structure and function, proteins tend to preferentially conserve amino acids at particular sites along the sequence. Because mutations can affect structure and function, the question arises whether the preference of a protein site for a particular amino acid varies between protein homologs, and to what extent that variation depends on sequence divergence. Answering these questions can help in the development of models of sequence evolution, as well as provide insights on the dependence of the fitness effects of mutations on the genetic background of sequences, a phenomenon known as epistasis. Here, I comment on recent computational work providing a systematic analysis of the extent to which the amino acid preferences of proteins depend on the background mutations of protein homologs.

10.
Genes (Basel) ; 10(5), 2019 04 27.
Article in English | MEDLINE | ID: mdl-31035578

ABSTRACT

More than a decade ago, a new mitochondrial Open Reading Frame (mtORF) was discovered in corals of the family Pocilloporidae and has been used since then as an effective barcode for these corals. Recently, mtORF sequencing revealed the existence of two differentiated Stylophora lineages occurring in sympatry along the environmental gradient of the Red Sea (18.5°C to 33.9°C). In the endemic Red Sea lineage RS_LinB, the mtORF and the heat shock protein gene hsp70 uncovered similar phylogeographic patterns strongly correlated with environmental variations. This suggests that the mtORF too might be involved in thermal adaptation. Here, we used computational analyses to explore the features and putative function of this mtORF. In particular, we tested the likelihood that this gene encodes a functional protein and whether it may play a role in adaptation. Analyses of full mitogenomes showed that the mtORF originated in the common ancestor of Madracis and other pocilloporids, and that it encodes a transmembrane protein differing in length and domain architecture among genera. Homology-based annotation and the relative conservation of metal-binding sites revealed traces of an ancient hydrolase catalytic activity. Furthermore, signals of pervasive purifying selection, lack of stop codons in 1830 sequences analyzed, and a codon-usage bias similar to that of other mitochondrial genes indicate that the protein is functional, i.e., not a pseudogene. Other features, such as intrinsically disordered regions, tandem repeats, and signals of positive selection particularly in Stylophora RS_LinB populations, are consistent with a role of the mtORF in adaptive responses to environmental changes.


Subjects
Anthozoa/genetics , Computational Biology , DNA, Mitochondrial/genetics , Mitochondria/genetics , Animals , Ecosystem , Indian Ocean , Open Reading Frames/genetics , Phylogeny , Phylogeography , Protein Conformation , Tandem Repeat Sequences/genetics
11.
Genome Biol Evol ; 11(1): 121-135, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30496400

ABSTRACT

The propensity of protein sites to be occupied by any of the 20 amino acids is known as site-specific amino acid preferences (SSAP). Under the assumption that SSAP are conserved among homologs, they can be used to parameterize evolutionary models for the reconstruction of accurate phylogenetic trees. However, simulations and experimental studies have not been able to fully assess the relative conservation of SSAP as a function of sequence divergence between protein homologs. Here, we implement a computational procedure to predict the SSAP of proteins based on the effect of changes in thermodynamic stability upon mutation. An advantage of this computational approach is that it allows us to interrogate a large and unbiased sample of homologous proteins, over the entire spectrum of sequence divergence, and under selection for the same molecular trait. We show that computational predictions have reproducibilities that resemble those obtained in experimental replicates, and can largely recapitulate the SSAP observed in a large-scale mutagenesis experiment. Our results support recent experimental reports on the conservation of SSAP of related homologs, with a slowly increasing fraction of up to 15% of different sites at sequence distances lower than 40%. However, even under the sole contribution of thermodynamic stability, our conservative approach identifies up to 30% of significantly different sites between divergent homologs. We show that this relation holds for homologs of diverse sizes and structural classes. Analyses of residue contact networks suggest that an important determinant of these differences is the increasing accumulation of structural deviations that results from sequence divergence.


Subjects
Amino Acid Substitution , Models, Genetic , Protein Stability , Sequence Homology, Amino Acid
12.
J Biol Chem ; 292(45): 18518-18529, 2017 11 10.
Article in English | MEDLINE | ID: mdl-28939764

ABSTRACT

Stringent regulation of tyrosine kinase activity is essential for normal cellular function. In humans, the tyrosine kinase Src is inhibited via phosphorylation of its C-terminal tail by another kinase, C-terminal Src kinase (Csk). Although Src and Csk orthologs are present across holozoan organisms, including animals and protists, the Csk-Src negative regulatory mechanism appears to have evolved gradually. For example, in choanoflagellates, Src and Csk are both active, but the negative regulatory mechanism is reportedly absent. In filastereans, a protist clade closely related to choanoflagellates, Src is active, but Csk is apparently inactive. In this study, we use a combination of bioinformatics, in vitro kinase assays, and yeast-based growth assays to characterize holozoan Src and Csk orthologs. We show that, despite appreciable differences in domain architecture, Csk from Corallochytrium limacisporum, a highly diverged holozoan marine protist, is active and can inhibit Src. However, in comparison with other Csk orthologs, Corallochytrium Csk displays broad substrate specificity and inhibits Src in an activity-independent manner. Furthermore, in contrast to previous studies, we show that Csk from the filasterean Capsaspora owczarzaki is active and that the Csk-Src negative regulatory mechanism is present in Csk and Src proteins from C. owczarzaki and the choanoflagellate Monosiga brevicollis. Our results suggest that negative regulation of Src by Csk is more ancient than previously thought and that it might be conserved across all holozoan species.


Subjects
Aquatic Organisms/enzymology , Choanoflagellata/enzymology , Protozoan Proteins/metabolism , src-Family Kinases/antagonists & inhibitors , Amino Acid Sequence , Amino Acid Substitution , CSK Tyrosine-Protein Kinase , Computational Biology , Conserved Sequence , Kinetics , Mutation , Phylogeny , Protein Interaction Domains and Motifs , Protozoan Proteins/antagonists & inhibitors , Protozoan Proteins/chemistry , Protozoan Proteins/genetics , Recombinant Proteins/chemistry , Recombinant Proteins/metabolism , Sequence Alignment , Sequence Homology, Amino Acid , Species Specificity , Structural Homology, Protein , Substrate Specificity , Two-Hybrid System Techniques , src-Family Kinases/chemistry , src-Family Kinases/genetics , src-Family Kinases/metabolism
13.
PLoS Comput Biol ; 10(12): e1003946, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25473967

ABSTRACT

The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated with specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet.


Subjects
Amino Acid Sequence , Amino Acids , Protein Conformation , Proteins , Amino Acids/chemistry , Amino Acids/genetics , Cluster Analysis , Computational Biology , Genotype , Models, Biological , Models, Molecular , Phenotype , Protein Folding , Proteins/chemistry , Proteins/genetics , Sequence Analysis, Protein
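
Note on the graph perspective in entry 13: the abstract treats viable sequences over a binary alphabet as nodes of a network whose edges are point mutations. The sketch below illustrates that construction only. The function `toy_phenotype` is a hypothetical stand-in for folding (it is not the exact lattice model or any potential from the paper), and the two-letter alphabet "HP" is an assumption chosen for illustration.

```python
from itertools import product

ALPHABET = "HP"  # assumed binary alphabet, for illustration only


def toy_phenotype(seq):
    """Hypothetical stand-in for folding: index of the first 'HH' pair.

    Returns None when the sequence has no 'HH' pair (treated as non-viable).
    """
    idx = seq.find("HH")
    return idx if idx >= 0 else None


def point_mutants(seq):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, current in enumerate(seq):
        for letter in ALPHABET:
            if letter != current:
                yield seq[:i] + letter + seq[i + 1:]


def genotype_networks(length):
    """Group viable sequences into connected sets sharing one phenotype."""
    viable = {}
    for letters in product(ALPHABET, repeat=length):
        seq = "".join(letters)
        phenotype = toy_phenotype(seq)
        if phenotype is not None:
            viable[seq] = phenotype

    seen, networks = set(), []
    for start in viable:
        if start in seen:
            continue
        seen.add(start)
        component, stack = [], [start]
        while stack:  # depth-first walk over same-phenotype point mutants
            seq = stack.pop()
            component.append(seq)
            for mutant in point_mutants(seq):
                if mutant not in seen and viable.get(mutant) == viable[start]:
                    seen.add(mutant)
                    stack.append(mutant)
        networks.append((viable[start], sorted(component)))
    return networks


networks = genotype_networks(4)
```

Replacing `toy_phenotype` with a real folding model would leave the network construction unchanged, which is the point of separating the two steps.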
14.
J Mol Evol ; 78(2): 101-8, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24309994

ABSTRACT

The distribution of variation in a quantitative trait and its underlying distribution of genotypic diversity can both be shaped by stabilizing and directional selection. Understanding either distribution is important, because it determines a population's response to natural selection. Unfortunately, existing theory makes conflicting predictions about how selection shapes these distributions, and very little pertinent experimental evidence exists. Here we study a simple genetic system, an evolving RNA enzyme (ribozyme) in which a combination of high throughput genotyping and measurement of a biochemical phenotype allow us to address this question. We show that directional selection, compared to stabilizing selection, increases the genotypic diversity of an evolving ribozyme population. In contrast, it leaves the variance in the phenotypic trait unchanged.


Subjects
Genetic Variation , Genotype , Phenotype , RNA, Catalytic/genetics , RNA, Catalytic/metabolism , Selection, Genetic , Azoarcus/genetics , Azoarcus/metabolism , Base Sequence , Evolution, Molecular , Molecular Sequence Data , Nucleic Acid Conformation , RNA, Catalytic/chemistry
15.
Genome Biol Evol ; 5(5): 966-77, 2013.
Article in English | MEDLINE | ID: mdl-23563968

ABSTRACT

Prokaryotic genomes are small and compact. This feature is caused either by neutral evolution or by natural selection favoring small genomes (genome streamlining). Three separate prior lines of evidence argue against streamlining for most prokaryotes. We find that the same three lines of evidence argue for streamlining in the genomes of thermophilic bacteria. Specifically, with increasing habitat temperature and decreasing genome size, the proportion of genomic DNA in intergenic regions decreases. Furthermore, with increasing habitat temperature, generation time decreases. Genome-wide selective constraints do not decrease as in the reduced genomes of host-associated species. Reduced habitat variability is not a likely explanation for the smaller genomes of thermophiles. Genome size may be an indirect target of selection due to its association with cell volume. We use metabolic modeling to demonstrate that known changes in cell structure and physiology at high temperature can provide a selective advantage to reduce cell volume at high temperatures.


Subjects
Adaptation, Physiological/genetics , Bacteria/genetics , Genome Size , Genome, Bacterial , Archaea/genetics , Evolution, Molecular , Genome, Archaeal , Hot Temperature , Prokaryotic Cells , Selection, Genetic
16.
Biophys J ; 102(8): 1916-25, 2012 Apr 18.
Article in English | MEDLINE | ID: mdl-22768948

ABSTRACT

The relationship between the genotype (sequence) and the phenotype (structure) of macromolecules affects their ability to evolve new structures and functions. We here compare the genotype space organization of proteins and RNA molecules to identify differences that may affect this ability. To this end, we computationally study the genotype-phenotype relationship for short RNA and lattice proteins of a reduced monomer alphabet size, to make exhaustive analysis and direct comparison of their genotype spaces feasible. We find that many fewer protein molecules than RNA molecules fold, but they fold into many more structures than RNA. In consequence, protein phenotypes have smaller genotype networks whose member genotypes tend to be more similar than for RNA phenotypes. Neighborhoods in sequence space of a given radius around an RNA molecule contain more novel structures than for protein molecules. We compare this property to evidence from natural RNA and protein molecules, and conclude that RNA genotype space may be more conducive to the evolution of new structure phenotypes.


Subjects
Computational Biology , Genotype , Phenotype , Proteins/chemistry , Proteins/metabolism , RNA/chemistry , RNA/genetics , Nucleic Acid Conformation , Protein Folding
17.
Nature ; 474(7349): 92-5, 2011 Jun 02.
Article in English | MEDLINE | ID: mdl-21637259

ABSTRACT

Cryptic variation is caused by the robustness of phenotypes to mutations. Cryptic variation has no effect on phenotypes in a given genetic or environmental background, but it can have effects after mutations or environmental change. Because evolutionary adaptation by natural selection requires phenotypic variation, phenotypically revealed cryptic genetic variation may facilitate evolutionary adaptation. This is possible if the cryptic variation happens to be pre-adapted, or "exapted", to a new environment, and is thus advantageous once revealed. However, this facilitating role for cryptic variation has not been proven, partly because most pertinent work focuses on complex phenotypes of whole organisms whose genetic basis is incompletely understood. Here we show that populations of RNA enzymes with accumulated cryptic variation adapt more rapidly to a new substrate than a population without cryptic variation. A detailed analysis of our evolving RNA populations in genotype space shows that cryptic variation allows a population to explore new genotypes that become adaptive only in a new environment. Our observations show that cryptic variation contains new genotypes pre-adapted to a changed environment. Our results highlight the positive role that robustness and epistasis can have in adaptive evolution.


Subjects
Adaptation, Physiological/genetics , Evolution, Molecular , Genetic Variation , RNA, Catalytic/genetics , RNA, Catalytic/metabolism , Azoarcus/enzymology , Azoarcus/genetics , Mutagenesis , Phenotype , Selection, Genetic/genetics
18.
PLoS One ; 5(11): e14172, 2010 Nov 30.
Article in English | MEDLINE | ID: mdl-21152394

ABSTRACT

The organization of protein structures in protein genotype space is well studied. The same does not hold for protein functions, whose organization is important to understand how novel protein functions can arise through blind evolutionary searches of sequence space. In systems other than proteins, two organizational features of genotype space facilitate phenotypic innovation. The first is that genotypes with the same phenotype form vast and connected genotype networks. The second is that different neighborhoods in this space contain different novel phenotypes. We here characterize the organization of enzymatic functions in protein genotype space, using a data set of more than 30,000 proteins with known structure and function. We show that different neighborhoods of genotype space contain proteins with very different functions. This property both facilitates evolutionary innovation through exploration of a genotype network, and it constrains the evolution of novel phenotypes. The phenotypic diversity of different neighborhoods is caused by the fact that some functions can be carried out by multiple structures. We show that the space of protein functions is not homogeneous, and different genotype neighborhoods tend to contain a different spectrum of functions, whose diversity increases with increasing distance of these neighborhoods in sequence space. Whether a protein with a given function can evolve specific new functions is thus determined by the protein's location in sequence space.


Subjects
Evolution, Molecular , Proteins/genetics , Proteins/metabolism , Algorithms , Binding Sites/genetics , Biocatalysis , Databases, Protein , Enzymes/chemistry , Enzymes/metabolism , Genotype , Protein Structure, Tertiary , Proteins/chemistry , Substrate Specificity
19.
Protein Sci ; 18(7): 1469-85, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19530247

ABSTRACT

Empirical or knowledge-based potentials have many applications in structural biology such as the prediction of protein structure, protein-protein, and protein-ligand interactions and in the evaluation of stability for mutant proteins, the assessment of errors in experimentally solved structures, and the design of new proteins. Here, we describe a simple procedure to derive and use pairwise distance-dependent potentials that rely on the definition of effective atomic interactions, which attempt to capture interactions that are more likely to be physically relevant. Based on a difficult benchmark test composed of proteins with different secondary structure composition and representing many different folds, we show that the use of effective atomic interactions significantly improves the performance of potentials at discriminating between native and near-native conformations. We also found that, in agreement with previous reports, the potentials derived from the observed effective atomic interactions in native protein structures contain a larger amount of mutual information. A detailed analysis of the effective energy functions shows that atom connectivity effects, which mostly arise when deriving the potential by the incorporation of those indirect atomic interactions occurring beyond the first atomic shell, are clearly filtered out. The shape of the energy functions for direct atomic interactions representing hydrogen bonding and disulfide and salt bridges formation is almost unaffected when effective interactions are taken into account. On the contrary, the shape of the energy functions for indirect atom interactions (i.e., those describing the interaction between two atoms bound to a direct interacting pair) is clearly different when effective interactions are considered. Effective energy functions for indirect interacting atom pairs are not influenced by the shape or the energy minimum observed for the corresponding direct interacting atom pair. 
Our results suggest that the dependency between the signals in different energy functions is a key aspect that needs to be addressed when empirical energy functions are derived and used, and also highlight the importance of additivity assumptions in the use of potential energy functions.


Subjects
Empirical Research , Models, Chemical , Proteins/chemistry , Algorithms , Disulfides , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Knowledge , Models, Molecular , Protein Conformation , ROC Curve , Thermodynamics
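
Note on the potentials in entry 19: distance-dependent knowledge-based potentials are commonly obtained by inverse Boltzmann statistics, where the energy of a distance bin is the negative log ratio of observed to reference frequencies. The abstract does not give the formula used, so the sketch below assumes that standard form. It does not implement the filtering by effective atomic interactions that the paper introduces, and the counts, bin layout, and function name are hypothetical.

```python
import math

# Thermal energy in kcal/mol at 298.15 K (gas constant times temperature).
KT = 0.0019872 * 298.15


def inverse_boltzmann(pair_counts, reference_counts, pseudocount=1.0):
    """Per-bin energies E = -kT * ln(p_observed / p_reference).

    pair_counts: observed contacts per distance bin for one atom-pair type.
    reference_counts: counts per bin for the reference state (assumed here
    to be all atom pairs pooled together).
    pseudocount: added to every bin to avoid taking the log of zero.
    """
    n_bins = len(pair_counts)
    total_pair = sum(pair_counts) + pseudocount * n_bins
    total_ref = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for observed, reference in zip(pair_counts, reference_counts):
        p_observed = (observed + pseudocount) / total_pair
        p_reference = (reference + pseudocount) / total_ref
        energies.append(-KT * math.log(p_observed / p_reference))
    return energies


# Made-up counts for three distance bins: the first bin is enriched for
# this pair type relative to the reference, the last one is depleted.
toy_pair = [60, 30, 10]
toy_reference = [300, 400, 300]
toy_energies = inverse_boltzmann(toy_pair, toy_reference)
```

Bins where the pair is more frequent than in the reference state receive negative (favorable) energies, and depleted bins receive positive ones.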
20.
BMC Bioinformatics ; 9: 265, 2008 Jun 05.
Article in English | MEDLINE | ID: mdl-18534022

ABSTRACT

BACKGROUND: As in many different areas of science and technology, most important problems in bioinformatics rely on the proper development and assessment of binary classifiers. A generalized assessment of the performance of binary classifiers is typically carried out through the analysis of their receiver operating characteristic (ROC) curves. The area under the ROC curve (AUC) constitutes a popular indicator of the performance of a binary classifier. However, the assessment of the statistical significance of the difference between any two classifiers based on this measure is not a straightforward task, since not many freely available tools exist. Most existing software is either not free, difficult to use or not easy to automate when a comparative assessment of the performance of many binary classifiers is intended. This constitutes the typical scenario for the optimization of parameters when developing new classifiers and also for their performance validation through the comparison to previous art. RESULTS: In this work we describe and release new software to assess the statistical significance of the observed difference between the AUCs of any two classifiers for a common task estimated from paired data or unpaired balanced data. The software is able to perform a pairwise comparison of many classifiers in a single run, without requiring any expert or advanced knowledge to use it. The software relies on a non-parametric test for the difference of the AUCs that accounts for the correlation of the ROC curves. The results are displayed graphically and can be easily customized by the user. A human-readable report is generated and the complete data resulting from the analysis are also available for download, which can be used for further analysis with other software. The software is released as a web server that can be used in any client platform and also as a standalone application for the Linux operating system. 
CONCLUSION: New software for the statistical comparison of ROC curves is released here as a web server and also as standalone software for the Linux operating system.


Subjects
Algorithms , Data Interpretation, Statistical , Diagnosis, Computer-Assisted/methods , ROC Curve , Software
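
Note on the test in entry 20: the abstract describes a non-parametric test for the difference of two AUCs that accounts for the correlation of the ROC curves, but does not name it. The sketch below assumes a DeLong-style test for paired data (both classifiers score the same cases) and is not the released software. The function names and toy scores are hypothetical, and the unpaired balanced case mentioned in the abstract is not covered.

```python
import math


def _psi(x, y):
    """Kernel of the Mann-Whitney statistic: 1 if x > y, 0.5 on ties, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    """Structural components: one value per positive and per negative case."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def auc(pos, neg):
    """Area under the ROC curve as the probability that a positive outranks a negative."""
    v10, _ = _components(pos, neg)
    return sum(v10) / len(v10)


def _cov(a, b):
    """Sample covariance of two equally long lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def compare_aucs(pos_a, neg_a, pos_b, neg_b):
    """Two-sided z-test for AUC_a - AUC_b on paired data.

    pos_a[i] and pos_b[i] must be the two classifiers' scores for the same
    positive case, and likewise for the negative lists.
    Returns (auc_a, auc_b, z, p_value).
    """
    v10a, v01a = _components(pos_a, neg_a)
    v10b, v01b = _components(pos_b, neg_b)
    auc_a, auc_b = sum(v10a) / len(v10a), sum(v10b) / len(v10b)
    variance = (
        (_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / len(pos_a)
        + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / len(neg_a)
    )
    if variance <= 0:
        raise ValueError("variance of the AUC difference is zero; test undefined")
    z = (auc_a - auc_b) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return auc_a, auc_b, z, p_value


# Toy paired scores: classifiers A and B evaluated on the same six cases.
pos_a, neg_a = [0.9, 0.8, 0.4], [0.7, 0.3, 0.2]
pos_b, neg_b = [0.6, 0.5, 0.7], [0.55, 0.65, 0.1]
auc_a, auc_b, z, p_value = compare_aucs(pos_a, neg_a, pos_b, neg_b)
```

The covariance terms are what distinguish this from comparing two independent AUCs: when both classifiers rank the same cases similarly, the variance of the difference shrinks and the test gains power.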